A mixed-level switching dynamic system for continuous speech recognition

Authors

  • Jeff Z. Ma
  • Li Deng
Abstract

A two-level mixture linear dynamic system model, with frame-level switching parameters in the observation equation and segment-level switching parameters in the target-directed state equation, is developed and evaluated. The main contributions of this work are: (1) a new framework for dealing with mixed-level switching in the dynamic system and (2) the novel use of piecewise linear functions, enabled by the introduction of frame-level switching, to approximate the nonlinear function between the hidden vocal-tract-resonance space and the observable acoustic space. The approximation is accomplished by the frame-dependent switching parameters in the observation equation. In this paper, in a self-contained manner, we highlight the key algorithmic differences from the earlier model, which has only single segment-level switching that is synchronous between the state and observation equations. A series of speech recognition experiments is carried out to evaluate this new model using a subset of Switchboard conversational speech data. The experimental results show that the approximation accuracy improves as the number of switching-parameter values increases. The speech recognizer built from the new mixed-level switching dynamic system model, evaluated with an N-best re-scoring paradigm, shows a moderate word error rate reduction compared with using either single-level switching or no switching parameters. © 2003 Published by Elsevier Ltd.
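The mixed-level idea in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation: it assumes a scalar hidden state, a target-directed update of the form x ← φ·x + (1 − φ)·target with (φ, target) fixed per segment, and an observation o = H_m·x + h_m whose index m may change at every frame. The nonlinear map `nonlinear_map`, the chord-based fitting in `fit_pieces`, and the rule of choosing m from the region that x falls in are all hypothetical choices made only for illustration.

```python
import math
import random


def nonlinear_map(x):
    # Hypothetical stand-in for the hidden-to-acoustic nonlinearity (an assumption).
    return math.log(1.0 + x * x)


def fit_pieces(lo, hi, n_pieces):
    """Approximate nonlinear_map on [lo, hi] by n_pieces chords (slope H, offset h)."""
    edges = [lo + (hi - lo) * i / n_pieces for i in range(n_pieces + 1)]
    pieces = []
    for a, b in zip(edges[:-1], edges[1:]):
        H = (nonlinear_map(b) - nonlinear_map(a)) / (b - a)
        h = nonlinear_map(a) - H * a
        pieces.append((a, b, H, h))
    return pieces


def pick_piece(pieces, x):
    """Frame-level switch: index of the linear piece whose region contains x."""
    for m, (_, b, _, _) in enumerate(pieces):
        if x <= b:
            return m
    return len(pieces) - 1


def simulate(segments, pieces, x0, noise_std=0.0, seed=0):
    """segments: list of (phi, target, n_frames), switched once per segment."""
    rng = random.Random(seed)
    x = x0
    states, observations, switches = [], [], []
    for phi, target, n_frames in segments:
        for _ in range(n_frames):
            # Target-directed state equation with segment-level parameters.
            x = phi * x + (1.0 - phi) * target + rng.gauss(0.0, noise_std)
            # Observation equation with frame-level parameters.
            m = pick_piece(pieces, x)
            _, _, H, h = pieces[m]
            o = H * x + h + rng.gauss(0.0, noise_std)
            states.append(x)
            observations.append(o)
            switches.append(m)
    return states, observations, switches


def max_approx_error(pieces, lo, hi, n_grid=200):
    """Largest gap between the piecewise linear map and nonlinear_map on a grid."""
    worst = 0.0
    for i in range(n_grid + 1):
        x = lo + (hi - lo) * i / n_grid
        _, _, H, h = pieces[pick_piece(pieces, x)]
        worst = max(worst, abs(H * x + h - nonlinear_map(x)))
    return worst


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, round(max_approx_error(fit_pieces(0.0, 3.0, n), 0.0, 3.0), 4))
```

In this toy setting the worst-case gap shrinks as more linear pieces are used, which mirrors, in a very simplified way, the abstract's statement that approximation accuracy improves with more switching-parameter values. Calling `simulate([(0.8, 2.0, 30), (0.8, 0.5, 30)], fit_pieces(0.0, 3.0, 8), x0=0.5)` produces a state trajectory that moves toward each segment's target while the observation index is free to change from frame to frame.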


Similar resources

Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition

Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...


Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...


Switching Dynamic System Models for Speech Articulation and Acoustics

A statistical generative model for the speech process is described that embeds a substantially richer structure than the HMM currently in predominant use for automatic speech recognition. This switching dynamic-system model generalizes and integrates the HMM and the piecewise stationary nonlinear dynamic system (state-space) model. Depending on the level and the nature of the switching in the m...


Improving the Performance of a Continuous Speech Recognition System Using Features Extracted from Speech Manifolds in the Reconstructed Phase Space

Designing new methods for extracting features from the speech signal and combining the information they provide is one of the most effective approaches to improving the performance of automatic speech recognition (ASR) systems. Recent research has shown that the speech signal has nonlinear and chaotic properties, but the effects of these properties are not used in the continuous ...


Journal:
  • Computer Speech & Language

Volume 18

Publication date: 2004